Overview (mixed models essentials and subject-matter intro)

Mixed models essentials

What are mixed models good for?

a broad class of statistical models that extend linear and generalized linear models to handle data where observations are measured within discrete groups such as field sites; years or other temporal blocks; individuals that are observed multiple times; genotypes; species; etc. They can be thought of (equivalently) as (1) accounting for the correlation among observations from the same group; (2) estimating the variability among groups, or (3) parsimoniously estimating the effects of groups. They are most useful when the experimental or observational design includes a large number of groups with varying numbers of observations per group.

What is a random effect anyway?

  • A grouping variable g (must be discrete!) and a varying term f
  • denoted as (f | g) in most R MM packages
  • "the effect of f varies across groups defined by g (f = 1 → intercept)
  • effects of f for each group g estimated by shrinkage (empirical Bayes, joint Bayesian prior, …)

When should you use a random effect?

  • don’t want to test hypotheses about differences between groups
  • do want to quantify the variability across groups
  • do want to make predictions for unobserved groups
  • do want to combine information across groups
  • do have variation in information per group (samples, noise)
  • do have groups randomly sampled from a population
  • do have a categorical nuisance variable
  • do have exchangeable groups
  • do have “many” (> 5-6) groups; small \(n\) per group; unbalanced groups

cf. Crawley (2002), Gelman (2005)

What tools?

  • Mostly lme4; gamm4 to deal with spatial autocorrelation
  • Some ggplot2, maybe some tidyverse
  • diagnostics etc.; broom.mixed, DHARMa, car (using :: notation as appropriate)
  • Other (G)LMM-adjacent packages listed here

Science intro

What is this project?

  • Paper in progress/review with Max Moritz (UCSB) and Enric Batllori Presas (Univ Barcelona)
  • Use synthetic global-scale databases of species richness, primary productivity, wildfire to quantify relationships between (NPP, fire consumption) and richness

Pictures?

Analytical goals?

  • estimate effects of NPP (g C m\(^2\)/year), fire consumption (% of NPP), interannual CV of NPP and fire consumption, and their interactions, on species richness
  • at the global level and possibly variation across different geographic scales

need for mixed models: geographic variation

  • realms (large-scale) (e.g. “Neotropics”)
  • biomes (medium-scale, environmental) (e.g. “tropical grassland”)
  • biome × realm interaction (“tropical grasslands in the Neotropics”)
  • “ecoregion”: sampling unit (Olson et al. 2001)

nesting and crossing of random effects

Nested: sub-unit IDs only measured within a single larger unit. e.g.: Plot1 in Block1 independent of Plot1 in Block2

Crossed: sub-unit IDs can be measured in multiple larger units. e.g. year, site

Unique coding: removes ambiguity

Robert Long, Cross Validated

random effects terms

  • “random effect of X” usually means intercept variation only, by default (but see Schielzeth and Forstmeier (2009))
    • one parameter (variance/std dev of intercept across groups)
  • RE terms with n parameters (intercept + slope = 2)
    • n*(n+1)/2 parameters
    • 10 for this example
  • RE terms with n independent effects (|| shortcut); only n parameters
    • 4 for this example

more preliminaries

  • we’ll work with log-scaled NPP/fire, raw CVs, all centered (Schielzeth 2010)
  • effects all evaluated at geometric mean of other variables
  • coefficients are approximately elasticities

finally, before we start …

knitr::include_graphics("pix/gelman_hill_complexity.png")

Gelman and Hill (2006)


knitr::include_graphics("pix/uriarte_yackulic_complexity.png")

Uriarte and Yackulic (2009)

Coding

load packages

Load packages up front, note what they’re used for …

library(tidyverse); theme_set(theme_bw())
library(lme4)
library(gamm4)
## diagnostics
library(DHARMa)
library(car) ## influencePlot
## extraction/graphics
library(broom.mixed)
library(dotwhisker)
library(gridExtra)
## from GitHub: see utils.R
library(r2glmm) ## bbolker/r2glmm
library(remef)  ## hohenstein/remef
source("utils.R")
source("gamm4_utils.R")

load data

dd <- readRDS("data/ecoreg.rds")

simplest model

Single-level model (biomes), intercept variation only. All pairwise interactions of main variables ((...)^2), plus (log of) ecoregion area:

m1 <- lmer(mbirds_log ~ log(area_km2) + (Feat_log_sc + Feat_cv_sc + NPP_log_sc + NPP_cv_sc)^2 +
             (1 | biome),
           data = dd)
## may get
## Warning message:
## Some predictor variables are on very different scales: consider rescaling 

diagnostics

Best practice: check diagnostics as early as possible (before summary(), coeff plots) to reduce snooping.

plot(m1, type = c("p", "smooth"))
## heteroscedasticity
plot(m1, sqrt(abs(resid(.))) ~ fitted(.), type = c("p", "smooth"))
car::influencePlot(m1)
plot(DHARMa::simulateResiduals(m1))
## basic coefficient plot
dotwhisker::dwplot(m1, effects="fixed") + geom_vline(xintercept = 0, lty = 2)

## ordered coefficient plot (from utils.R)
dwplot_ordered(m1, effects = "fixed")

add RE terms

  • allow the main effects to vary across biomes (in a correlated way)
  • update() is your friend
m2 <- update(m1, . ~ . - (1|biome) + (1 + Feat_log_sc + Feat_cv_sc + NPP_log_sc + NPP_cv_sc | biome))
## compare plots
dwplot_ordered(list(intercept_only = m1, full = m2), effects = "fixed")

three-level model

Now go to the (almost) maximal model; variation of main effects at all three geographic levels

## ~ 30 seconds
max_model <- lmer(mbirds_log ~ log(area_km2) + (Feat_log_sc + Feat_cv_sc + NPP_log_sc + NPP_cv_sc)^2 +
                    (Feat_log_sc + Feat_cv_sc + NPP_log_sc + NPP_cv_sc | biome) +
                    (Feat_log_sc + Feat_cv_sc + NPP_log_sc + NPP_cv_sc | flor_realms) +
                    (Feat_log_sc + Feat_cv_sc + NPP_log_sc + NPP_cv_sc | biome_FR),
                  data = dd,
                  ## for speed/skip convergence warnings
                  control = lmerControl(calc.derivs = FALSE))

checking the covariance parameter estimates

isSingular(max_model)
## [1] TRUE
lwr <- getME(max_model, "lower"); theta <- getME(max_model, "theta"); min(theta[lwr == 0])
## [1] 0
VarCorr(max_model)
##  Groups      Name        Std.Dev. Corr                       
##  biome_FR    (Intercept) 0.108507                            
##              Feat_log_sc 0.028953 -0.998                     
##              Feat_cv_sc  0.058037  0.968 -0.975              
##              NPP_log_sc  0.112513 -0.011 -0.024  0.240       
##              NPP_cv_sc   0.036552 -0.987  0.991 -0.996 -0.148
##  biome       (Intercept) 0.126920                            
##              Feat_log_sc 0.011041 -1.000                     
##              Feat_cv_sc  0.050648  0.937 -0.937              
##              NPP_log_sc  0.148073 -0.467  0.467 -0.748       
##              NPP_cv_sc   0.091318 -0.619  0.619 -0.855  0.984
##  flor_realms (Intercept) 0.229506                            
##              Feat_log_sc 0.130446  0.561                     
##              Feat_cv_sc  0.072527  0.396  0.442              
##              NPP_log_sc  0.118908 -0.360 -0.965 -0.514       
##              NPP_cv_sc   0.031674  0.246 -0.662 -0.095  0.792
##  Residual                0.193242

maximal model

model simplification

  • avoid singularity/non-convergence (Barr et al. 2013; Schielzeth and Forstmeier 2009)
  • data-driven (AIC, p-value) (Bates et al. 2015; Matuschek et al. 2017)

non-convergence vs singularity

  • convergence warnings: historical reasons
  • very unreliable (and slow!) for large data sets (>10,000 observations)
  • gold standard: run allFit(), diagnose/evaluate differences in effects of interest

AIC table/strategy

  • for loop over table
  • fitting many models is a code smell but sometimes you can’t avoid it
  • reformulate() is useful
all_vars <- "1 + Feat_log_sc + Feat_cv_sc + NPP_log_sc + NPP_cv_sc"
v1 <- expand.grid(c("1 | ", paste(all_vars, "|"), paste(all_vars, "||")),
                  c("biome", "flor_realms", "biome_FR"))
v2 <- sprintf("(%s)", apply(v1, 1, paste, collapse = " "))
## use cross_df instead of expand.grid, want chars
v3 <- cross_df(list(biome = v2[1:3], FR = v2[4:6], biome_FR = v2[7:9]))
v3[1,]
## # A tibble: 1 × 3
##   biome        FR                 biome_FR       
##   <chr>        <chr>              <chr>          
## 1 (1 |  biome) (1 |  flor_realms) (1 |  biome_FR)
model_list <- list()
p1 <- proc.time()
for (i in 1:nrow(v3)) {
  cat(i, unlist(v3[i,]), "\n")
  form <- reformulate(
      c(sprintf("(%s)^2", all_vars), ## main effects and 2-way interax
        "log(area_km2)",             ## plus ecoregion area
        unlist(v3[i,])),             ## plus specified REs
      response = "mbirds_log")
  model_list[[i]] <- lmer(form, data = dd) ## fit model
}
saveRDS(model_list, file = "data/model_list.rds")
proc.time() - p1

extract summary info from models, take a look

model_list <- readRDS("data/model_list.rds")
aic_vec <- sapply(model_list, AIC)
is_sing <- sapply(model_list, isSingular)
conv_warn <- sapply(model_list, has_warning)
tibble(model=1:27, aic_vec, is_sing, conv_warn) %>% arrange(aic_vec)
## # A tibble: 27 × 4
##    model aic_vec is_sing conv_warn
##    <int>   <dbl> <lgl>   <lgl>    
##  1    19    35.7 FALSE   FALSE    
##  2    16    38.6 TRUE    FALSE    
##  3    25    39.7 TRUE    FALSE    
##  4    10    40.9 TRUE    FALSE    
##  5    21    43.2 TRUE    FALSE    
##  6    27    46.2 TRUE    FALSE    
##  7    18    46.8 TRUE    FALSE    
##  8    12    48.4 TRUE    FALSE    
##  9     9    49.5 FALSE   FALSE    
## 10    13    51.2 FALSE   TRUE     
## # … with 17 more rows

find best-AIC, non-singular model

best_index <- which(aic_vec == min(aic_vec) & !is_sing & !conv_warn)
best_model <- model_list[[best_index]]

check it out

best_model
## Linear mixed model fit by REML ['lmerMod']
## Formula: 
## mbirds_log ~ (1 + Feat_log_sc + Feat_cv_sc + NPP_log_sc + NPP_cv_sc)^2 +  
##     log(area_km2) + (1 | biome) + (1 | flor_realms) + ((1 | biome_FR) +  
##     (0 + Feat_log_sc | biome_FR) + (0 + Feat_cv_sc | biome_FR) +  
##     (0 + NPP_log_sc | biome_FR) + (0 + NPP_cv_sc | biome_FR))
##    Data: dd
## REML criterion at convergence: -4.3321
## Random effects:
##  Groups      Name        Std.Dev.
##  biome_FR    NPP_cv_sc   0.07758 
##  biome_FR.1  NPP_log_sc  0.13922 
##  biome_FR.2  Feat_cv_sc  0.08691 
##  biome_FR.3  Feat_log_sc 0.05442 
##  biome_FR.4  (Intercept) 0.13560 
##  biome       (Intercept) 0.10908 
##  flor_realms (Intercept) 0.23103 
##  Residual                0.19400 
## Number of obs: 620, groups:  biome_FR, 55; biome, 14; flor_realms, 6
## Fixed Effects:
##            (Intercept)             Feat_log_sc              Feat_cv_sc  
##               5.714624                0.091608               -0.006491  
##             NPP_log_sc               NPP_cv_sc           log(area_km2)  
##               0.260378                0.066486               -0.029917  
## Feat_log_sc:Feat_cv_sc  Feat_log_sc:NPP_log_sc   Feat_log_sc:NPP_cv_sc  
##              -0.011307               -0.051931               -0.047096  
##  Feat_cv_sc:NPP_log_sc    Feat_cv_sc:NPP_cv_sc    NPP_log_sc:NPP_cv_sc  
##               0.013720               -0.036622                0.006426

diagnostics

basics

plot(best_model, type = c("p", "smooth"))
## heteroscedasticity
plot(best_model, sqrt(abs(resid(.))) ~ fitted(.), type = c("p", "smooth"))
car::influencePlot(best_model)
plot(sr <- simulateResiduals(best_model))

What’s going on with DHARMa?

Computes residuals at population level (usually a good default but not necessarily appropriate here?)

DHARMa::plotResiduals(sr, dd$NPP_log_sc)

pop_resids <- model.response(model.frame(best_model)) - predict(best_model, re.form=NA)
p1 <- lattice::xyplot(pop_resids ~ dd$NPP_log_sc, cex = 1, type = c("p", "smooth"),
                      main = "pop resids (DHARMa)")
p2 <- plot(best_model, resid(.) ~ NPP_log_sc, type=c("p", "smooth"),
           main = "unit resids (lme4)")
gridExtra::grid.arrange(p1, p2, nrow = 1)

spatial correlation

Easiest to explore spatial correlations graphically:

dd$res1 <- residuals(best_model)
ggplot(dd, aes(x, y, colour = res1, size = abs(res1))) +
  geom_point() +
  scale_colour_gradient2() +
  scale_size(range=c(2,7))

  • Can see sub-realm geographic variation
  • Could work on spatial variograms, Moran’s I statistic, etc..

refit with gamm4

Quickest (not necessarily best) fix for spatial autocorr; bs="sos" is a spherical smooth

gamm4_form  <- update(formula(best_model), . ~ . + s(y, x, bs="sos"))
best_gamm4 <- gamm4(formula = nobars(gamm4_form),
                    random = as.formula(reOnly(gamm4_form)),
                    data = dd)
class(best_gamm4) <- c("gamm4", "list") ## so we can tidy etc. (using gamm4_utils.R)

compare results

dwplot_ordered(list(best_gamm4, best_model), effects="fixed")

display/description

coefficient plots

  • with dotwhisker::dwplot() as shown before, or broom.mixed::tidy() + ggplot tweaking
  • scaled or unscaled coefficients?

\(R^2\) and partial \(R^2\) values

rsqvals <- r2glmm::r2beta(best_model, method = "sgv", partial = TRUE)
## set (reverse)factor order (or forcats::fct_inorder())
rsqvals$Effect <- factor(rsqvals$Effect, levels = rev(rsqvals$Effect))
ggplot(rsqvals, aes(x=Rsq, xmin = lower.CL, xmax = upper.CL, y = Effect)) +
  geom_pointrange()

prettier version

prediction plots

NPPvec <- with(dd, seq(min(NPP_log_sc), max(NPP_log_sc), length.out = 51))
pred <- emmeans::emmeans(best_model, ~NPP_log_sc, at = list(NPP_log_sc = NPPvec))
## NOTE: Results may be misleading due to involvement in interactions
gg0 <- ggplot(as.data.frame(pred),
              aes(NPP_log_sc, emmean)) + geom_line() +
  geom_ribbon(aes(ymin=lower.CL, ymax = upper.CL), alpha = 0.2, colour = NA)
gg1 <- gg0 + geom_point(data = dd, aes(y = mbirds_log, colour = biome))
print(gg1)

Can also back-transform, unscale, etc. (see e.g. Stack Overflow Q1, Stack Overflow Q2)

partial residuals

Compute partial residuals and display:

dd$remef <- remef::remef(best_model, fix = "NPP_log_sc")
grid.arrange(gg1,
             gg0 + geom_point(data = dd, aes(y = remef, colour = biome)) +
             scale_y_continuous(limits = range(dd$mbirds_log)),
             nrow=1)

random effects

  • Can extract, plot, etc. examine random effects
  • Built-in dotplot, qqmath methods
  • Or use as.data.frame(ranef()) or broom.mixed::tidy(model, effects = "ran_vals")
  • For this problem, REs were interesting but too noisy to draw conclusions from
grid.arrange(grobs = lattice::dotplot(ranef(best_model)), nrow=1)

extras

further reading

  • Gelman and Hill (2006); McElreath (2015); Bolker (2015) (see worked examples); Wood (2017)
  • with caveats: Zuur et al. (2009) (and others); Bolker et al. (2009)

GLMMs

  • the entire layer of GLMMs can be added
  • standard families (Poisson/Gamma/binomial etc.) via glmer
  • zero-inflation, Tweedie, neg binom … glmmTMB, brms
  • more complex machinery (Gauss-Hermite quadrature, GLMMadaptive: or go Bayesian)
  • observation-level random effects

Bayesian approaches

  • slower but you get more
  • regularization is easy
  • informative priors
  • better handling of uncertainty, especially for complex models

regularization

  • if you want to fit a very complex model but prevent singular fits etc., you need to regularize the model somehow (i.e. force it to be sensible)
  • this is easiest to do in a Bayesian or semi-Bayesian framework
  • blme, MCMCglmm, rstanarm, brms

model simplification/covariance structures

beyond intercept-only and diagonal structures

  • Barr et al. (2013) also discusses slope-only models
  • compound symmetry (all-equal correlations)
    • (1|g/f) instead of (f|g) for factor terms
    • cs() in glmmTMB, other possibilities in nlme::lme, MCMCglmm, brms, …
  • factor-analytic models (Maeve McGillycuddy, UNSW)

autocorrelation

  • mgcv: Markov random field, etc.
  • brms: can use all mgcv smooths
  • INLA: probably best for spatial/temporal, but a whole other world
  • glmmTMB: simple space/time stuff with complex GLMMs

REML vs ML

  • not as important as people think?
  • analogous to dividing by \(n-1\) instead of \(n\) when computing variance
  • REML better for RE covariance estimates
  • ML required for model comparison

mixed-model ecosystem in R

references

references

Barr, Dale J., Roger Levy, Christoph Scheepers, and Harry J. Tily. 2013. “Random Effects Structure for Confirmatory Hypothesis Testing: Keep It Maximal.” Journal of Memory and Language 68 (3): 255–78. https://doi.org/10.1016/j.jml.2012.11.001.

Bates, Douglas, Reinhold Kliegl, Shravan Vasishth, and Harald Baayen. 2015. “Parsimonious Mixed Models.” arXiv:1506.04967 [Stat], June. http://arxiv.org/abs/1506.04967.

Bolker, Benjamin M. 2015. “Linear and Generalized Linear Mixed Models.” In Ecological Statistics: Contemporary Theory and Application, edited by Gordon A. Fox, Simoneta Negrete-Yankelevich, and Vinicio J. Sosa. Oxford University Press.

Bolker, Benjamin M., Mollie E. Brooks, Connie J. Clark, Shane W. Geange, John R. Poulsen, M. Henry H. Stevens, and Jada-Simone S. White. 2009. “Generalized Linear Mixed Models: A Practical Guide for Ecology and Evolution.” Trends in Ecology & Evolution 24: 127–35. https://doi.org/10.1016/j.tree.2008.10.008.

Crawley, Michael J. 2002. Statistical Computing: An Introduction to Data Analysis Using S-PLUS. John Wiley & Sons.

Gelman, Andrew. 2005. “Analysis of Variance: Why It Is More Important Than Ever.” Annals of Statistics 33 (1): 1–53. https://doi.org/doi:10.1214/009053604000001048.

Gelman, Andrew, and Jennifer Hill. 2006. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge, England: Cambridge University Press. http://www.stat.columbia.edu/~gelman/arm/.

Matuschek, Hannes, Reinhold Kliegl, Shravan Vasishth, Harald Baayen, and Douglas Bates. 2017. “Balancing Type I Error and Power in Linear Mixed Models.” Journal of Memory and Language 94: 305–15. https://doi.org/10.1016/j.jml.2017.01.001.

McElreath, Richard. 2015. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Boca Raton: Chapman; Hall/CRC.

Olson, D. M., E. Dinerstein, E. D. Wikramanayake, N. D. Burgess, G. V. Powell, E. C. Underwood, J. A. D’amico, et al. 2001. “Terrestrial Ecoregions of the World: A New Map of Life on Earth: A New Global Map of Terrestrial Ecoregions Provides an Innovative Tool for Conserving Biodiversity.” BioScience 51 (11): 933–38.

Schielzeth, Holger. 2010. “Simple Means to Improve the Interpretability of Regression Coefficients.” Methods in Ecology and Evolution 1: 103–13. https://doi.org/10.1111/j.2041-210X.2010.00012.x.

Schielzeth, Holger, and Wolfgang Forstmeier. 2009. “Conclusions Beyond Support: Overconfident Estimates in Mixed Models.” Behavioral Ecology 20 (2): 416–20. https://doi.org/10.1093/beheco/arn145.

Uriarte, María, and Charles B. Yackulic. 2009. “Preaching to the Unconverted.” Ecological Applications 19 (3): 592–96. http://www.jstor.org/stable/27646001.

Wood, Simon. 2017. Generalized Additive Models: An Introduction with R. 2d ed. CRC Texts in Statistical Science. Chapman & Hall.

Zuur, Alain F., Elena N. Ieno, Neil J. Walker, Anatoly A. Saveliev, and Graham M. Smith. 2009. Mixed Effects Models and Extensions in Ecology with R. Springer.